docs: target Context7 benchmark gaps in Python skills [no-ci] by lukeocodes · Pull Request #699 · deepgram/deepgram-python-sdk

lukeocodes · 2026-04-27T14:13:21Z

Summary

Closes the four largest gaps in the Context7 benchmark for /deepgram/deepgram-python-sdk. Current score: 88.8/100 (mean across 10 standardized prompts). The 4 weakest prompts account for ~97 of the 112 missing points; this PR addresses each one specifically.

What's broken (Context7 evaluator quotes)

#	Prompt	Score	What's missing
1	Voice agent dynamic adjustment + stream restart/pause	66	"lacks specific guidance or API methods for dynamically adjusting transcription parameters during an active connection or for intelligently managing stream restarts and pauses beyond basic error events"
2	Live streaming with interim results display	71	"all examples show `interim_results=False`, which is the opposite of what's needed, and none demonstrate how to differentiate between interim and final results or how to handle the display logic"
5	Diarization + word-level timings combined	83	"lacks a specific, complete code example showing how to enable both diarization and word-level timings together in a single request"
8	Async URL transcription + retrieve final result	83	"lacks critical information about handling asynchronous results — doesn't explain how to retrieve the final transcription when using async methods or how to poll for results"

Changes

`deepgram-python-voice-agent/SKILL.md` (+139 lines, prompt #1)

New "Dynamic mid-session adjustment" section — runnable code for every control message exposed by the Agent socket client:
- send_update_prompt(AgentV1UpdatePrompt) — swap LLM system prompt mid-conversation
- send_update_speak(AgentV1UpdateSpeak) — swap TTS voice
- send_update_think(AgentV1UpdateThink) — swap LLM provider/model
- send_inject_agent_message(...) — force agent to say something
- send_inject_user_message(...) — inject user input
- send_keep_alive(...) — idle keep-alive
- Server reply event names noted for each (PromptUpdated, SpeakUpdated, ThinkUpdated, InjectionRefused)
- Async equivalents
New "Stream lifecycle & recovery" section — KeepAlive loop on idle, pause/resume audio, reconnect after disconnect with conversation history replay via AgentV1SettingsAgentContext, EventType.CLOSE / EventType.ERROR handling

`deepgram-python-speech-to-text/SKILL.md` (+103 lines, prompts #2 + #8)

Prompt #2:

Rewrote the WebSocket quick-start to pass interim_results=True, utterance_end_ms=1000, vad_events=True
Real overwrite-line display pattern showing interim results live and committing the line on final
New "Interim vs. final flag semantics" subsection on is_final, speech_final, from_finalize distinctions

Prompt #8:

New "Async / deferred result patterns" section explicitly distinguishing Python async/await (sync-style, immediate result via AsyncDeepgramClient) from deferred via callback URL (returns request_id immediately, results POST'd to webhook later — no polling)
Decision table mapping each pattern to when to use it
Pointer to examples/12-transcription-prerecorded-callback.py

`deepgram-python-audio-intelligence/SKILL.md` (+41 lines, prompt #5)

New "Quick start — diarization with word-level timings" section
One focused snippet: diarize=True, smart_format=True, punctuate=True + per-word iteration accessing speaker, start, end, confidence, punctuated_word
Per-word fields table
groupby-by-speaker utterance pattern + pointer to utterances=True / paragraphs=True for pre-grouped views

Expected lift

If every gap closes:

Prompt 1: 66 → ~95 (+29)
Prompt 2: 71 → ~95 (+24)
Prompt 5: 83 → ~95 (+12)
Prompt 8: 83 → ~95 (+12)

Total potential: +77 / 1000 (across 10 prompts) = 88.8 → ~96.5 benchmark score.

After merge

Trigger Context7 refresh on /deepgram/deepgram-python-sdk to pull the new content into the index, then re-run the benchmark to verify the lift.

The Context7 benchmark for /deepgram/deepgram-python-sdk scores the SDK against 10 standardized prompts (rubric: implementation 40 + accuracy 25 + relevance 20 + completeness 10 + clarity 5 = 100). Current score: 88.8. Four prompts had the largest gaps: Prompt #1 (66/100) - Voice agent dynamic adjustment + stream restart Eval said the skill 'lacks specific guidance or API methods for dynamically adjusting transcription parameters during an active connection or for intelligently managing stream restarts and pauses beyond basic error events'. deepgram-python-voice-agent/SKILL.md: - New 'Dynamic mid-session adjustment' section with runnable code for send_update_prompt, send_update_speak, send_update_think, send_inject_agent_message, send_inject_user_message, send_keep_alive (sync + async equivalents). - New 'Stream lifecycle & recovery' section covering KeepAlive on idle, pause/resume audio, reconnect after disconnect with conversation history replay via AgentV1SettingsAgentContext, and EventType.CLOSE / EventType.ERROR handling guidance. Prompt #2 (71/100) - Live streaming with interim results display Eval said 'all examples show interim_results=False, which is the opposite of what's needed, and none demonstrate how to differentiate between interim and final results or how to handle the display logic'. deepgram-python-speech-to-text/SKILL.md: - Rewrote the WebSocket quick-start to pass interim_results=True, utterance_end_ms=1000, vad_events=True, with a real overwrite-line pattern that shows interim results live and commits the line on final. - Added an 'Interim vs. final flag semantics' subsection explaining is_final, speech_final, and from_finalize distinctions and when each fires. Prompt #5 (83/100) - Diarization + word-level timings combined Eval said the skill 'lacks a specific, complete code example showing how to enable both diarization and word-level timings together in a single request'. deepgram-python-audio-intelligence/SKILL.md: - New 'Quick start - diarization with word-level timings' section: one focused snippet enabling diarize=True with per-word iteration showing speaker, start, end, confidence, punctuated_word. - Added a per-word fields table (word, punctuated_word, start, end, confidence, speaker, speaker_confidence) plus a groupby-by-speaker pattern and pointers to utterances=True / paragraphs=True for pre-grouped views. Prompt #8 (83/100) - Async URL transcription + retrieve final result Eval said the skill 'lacks critical information about handling asynchronous results - while it mentions callback functionality, it doesn't explain how to retrieve the final transcription when using async methods or how to poll for results'. deepgram-python-speech-to-text/SKILL.md: - New 'Async / deferred result patterns' section explicitly distinguishing Python async/await (sync-style, immediate result via AsyncDeepgramClient) from deferred via callback URL (returns request_id immediately, results POST'd to webhook later, no polling). - Decision table mapping each pattern to when to use it, with pointer to examples/12-transcription-prerecorded-callback.py. Net: +276 lines targeting ~97 missing benchmark points (potential lift 88.8 -> ~98 once Context7 reindexes).

Copilot

Pull request overview

This PR updates the in-repo Context7 “skills” documentation for the Deepgram Python SDK to address several benchmark prompt gaps, primarily by adding more complete, runnable examples and clarifying behavioral semantics (interim vs final streaming, mid-session agent updates, diarization + word timings, and async patterns).

Changes:

Added mid-session voice agent control-message examples (prompt/think/speak updates, message injection, keep-alives) and reconnection/context replay guidance.
Reworked live WebSocket transcription quick-start to demonstrate interim_results=True with clear interim-vs-final display handling and clarified result flags.
Added a focused diarization + per-word timing quick-start and expanded async/deferred transcription guidance for prerecorded URL transcription.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
`.agents/skills/deepgram-python-voice-agent/SKILL.md`	Adds dynamic mid-session update examples and stream lifecycle/recovery guidance for Agent V1.
`.agents/skills/deepgram-python-speech-to-text/SKILL.md`	Updates live streaming quick-start for interim results and adds async/deferred result patterns + flag semantics.
`.agents/skills/deepgram-python-audio-intelligence/SKILL.md`	Adds a diarization + word-level timings quick-start and per-word field reference table.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

… feedback) Both Copilot threads on PR #699: - deepgram-python-speech-to-text/SKILL.md interim-results snippet used `global last_interim_len` but the variable was defined in the enclosing `with` block, not at module scope. That would raise NameError on the first read. Replaced with a mutable closure (`state = {...}` dict), which is the idiomatic pattern when a callback needs to mutate state inside a `with` block. - deepgram-python-voice-agent/SKILL.md said the server emits a 'History event (type agent_v1history)'. `agent_v1history` is the internal Python module/file name, not the wire `type` literal. The wire `type` is `"History"` and the Python class is `AgentV1History`. Reworded so readers don't pattern-match on the wrong identifier.

🤖 I have created a release *beep* *boop* --- ## [7.1.0](v7.0.0...v7.1.0) (2026-05-06) ### Features * update generated SDK models and restore agent settings compatibility ([#705](#705)) ([0b820c9](0b820c9)) ### Documentation * target Context7 benchmark gaps in Python skills [no-ci] ([#699](#699)) ([a232eb8](a232eb8)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

lukeocodes requested review from GregHolmes and Copilot April 27, 2026 14:13

Copilot started reviewing on behalf of lukeocodes April 27, 2026 14:13 View session

Copilot AI reviewed Apr 27, 2026

View reviewed changes

Comment thread .agents/skills/deepgram-python-speech-to-text/SKILL.md Outdated

Comment thread .agents/skills/deepgram-python-voice-agent/SKILL.md Outdated

GregHolmes approved these changes Apr 27, 2026

View reviewed changes

lukeocodes merged commit a232eb8 into main Apr 27, 2026
10 checks passed

lukeocodes deleted the lo/ctx7-benchmark-lift-python branch April 27, 2026 16:14

github-actions Bot mentioned this pull request Apr 27, 2026

chore(main): release 7.1.0 #700

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: target Context7 benchmark gaps in Python skills [no-ci]#699

docs: target Context7 benchmark gaps in Python skills [no-ci]#699
lukeocodes merged 2 commits intomainfrom
lo/ctx7-benchmark-lift-python

lukeocodes commented Apr 27, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

lukeocodes commented Apr 27, 2026

Summary

What's broken (Context7 evaluator quotes)

Changes

deepgram-python-voice-agent/SKILL.md (+139 lines, prompt #1)

deepgram-python-speech-to-text/SKILL.md (+103 lines, prompts #2 + #8)

deepgram-python-audio-intelligence/SKILL.md (+41 lines, prompt #5)

Expected lift

After merge

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

`deepgram-python-voice-agent/SKILL.md` (+139 lines, prompt #1)

`deepgram-python-speech-to-text/SKILL.md` (+103 lines, prompts #2 + #8)

`deepgram-python-audio-intelligence/SKILL.md` (+41 lines, prompt #5)